Improving the GPU space of computation under triangular domain problems
نویسندگان
چکیده
There is a stage in the GPU computing pipeline where a grid of thread-blocks is mapped to the problem domain. Normally, this grid is a k-dimensional bounding box that covers a k-dimensional problem no matter its shape. Threads that fall inside the problem domain perform computations, otherwise they are discarded at runtime. For problems with non-square geometry, this is not always the best idea because part of the space of computation is executed without any practical use. Twodimensional triangular domain problems, alias td-problems, are a particular case of interest. Problems such as the Euclidean distance map, LU decomposition, collision detection and simulations over triangular tiled domains are all td-problems and they appear frequently in many areas of science. In this work, we propose an improved GPU mapping function g(λ), that maps any λ block to a unique location (i, j) in the triangular domain. The mapping is based on the properties of the lower triangular matrix and it works at a block level, thus not compromising thread organization within a block. The theoretical improvement from using g(λ) is upper bounded as I < 2 and the number of wasted blocks is reduced from O(n) to O(n). We compare our strategy with other proposed methods; the upper-triangular mapping (UTM), the rectangular box (RB) and the recursive partition (REC). Our experimental results on Nvidia’s Kepler GPU architecture show that g(λ) is between 12% and 15% faster than the bounding box (BB) strategy. When compared to the other strategies, our mapping runs significantly faster than UTM and it is as fast as RB in practical use, with the advantage that thread organization is not compromised, as in RB. This work also contributes at presenting, for the first time, a fair comparison of all existing strategies running the same experiments under the same hardware.
منابع مشابه
A Non-linear GPU Thread Map for Triangular Domains
There is a stage in the GPU computing pipeline where a grid of thread-blocks, in parallel space, is mapped onto the problem domain, in data space. Since the parallel space is restricted to a box type geometry, the mapping approach is typically a k-dimensional bounding box (BB) that covers a p-dimensional data space. Threads that fall inside the domain perform computations while threads that fal...
متن کاملExploiting Structure of Symmetric or Triangular Matrices on a GPU
Matrix computations are expensive, and GPUs have the potential to deliver results at reduced cost by exploiting parallel computation. We focus on dense matrices of the form ADA , where A is an m×n matrix (m ≤ n) and D is an n× n diagonal matrix. Many important numerical problems require solving linear systems of equations involving matrices of this form. These problems include normal equations ...
متن کاملFast Solving of Influence Diagrams for Multiagent Planning on GPU-enabled Architectures
Planning under uncertainty in multiagent settings is highly intractable because of history and plan space complexities. Probabilistic graphical models exploit the structure of the problem domain to mitigate the computational burden. In this paper, we introduce the first parallelization of planning in multiagent settings on a CPU-GPU heterogeneous system. In particular, we focus on the algorithm...
متن کاملMPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling
Among different discretization approaches, Finite Difference Method (FDM) is widely used for acoustic and elastic full-wave form modeling. An inevitable deficit of the technique, however, is its sever requirement to computational resources. A promising solution is parallelization, where the problem is broken into several segments, and the calculations are distributed over different processors. ...
متن کاملParallel Triangular Solvers on GPU
In this paper, we investigate GPU based parallel triangular solvers systematically. The parallel triangular solvers are fundamental to incomplete LU factorization family preconditioners and algebraic multigrid solvers. We develop a new matrix format suitable for GPU devices. Parallel lower triangular solvers and upper triangular solvers are developed for this new data structure. With these solv...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1308.1419 شماره
صفحات -
تاریخ انتشار 2012